# Lecture 18: Pipelining

- Today's topics:
  - 5-stage pipeline
  - Hazards
  - Data dependence handling with bypassing
  - Data dependence examples

## Performance Improvements?

- Does it take longer to finish each individual job?
- Does it take less time to finish a series of jobs?
- What assumptions were made while answering these questions?
  - No dependences between instructions
  - Easy to partition circuits into uniform pipeline stages
  - No latch overhead
- Is a 10-stage pipeline better than a 5-stage pipeline?

### **Quantitative Effects**

- As a result of pipelining:
  - The latency of each individual instruction (in ns) goes up
  - Each instruction takes more cycles to execute
  - But the average CPI remains roughly the same
  - Clock speed goes up
  - Total execution time goes down, resulting in a lower average time per instruction
  - Under ideal conditions, speedup
    - = ratio of elapsed times between successive instruction completions
    - = number of pipeline stages = increase in clock speed
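The ideal-speedup claim, and the earlier question about 10 stages versus 5, can be checked with a little arithmetic. The sketch below is only an illustration: the function name `pipeline_speedup`, the 10 ns logic delay, and the 0.5 ns latch overhead are assumed values, not figures from the lecture. It also assumes the logic splits into perfectly uniform stages and that there are no stalls.

```python
def pipeline_speedup(total_logic_ns: float, stages: int,
                     latch_overhead_ns: float = 0.0) -> float:
    """Throughput speedup of an evenly partitioned pipeline over the
    unpipelined circuit, assuming one instruction completes per cycle."""
    # Unpipelined: one instruction completes every total_logic_ns.
    unpipelined_cycle_ns = total_logic_ns
    # Pipelined: each stage holds 1/stages of the logic plus one latch delay.
    pipelined_cycle_ns = total_logic_ns / stages + latch_overhead_ns
    return unpipelined_cycle_ns / pipelined_cycle_ns


if __name__ == "__main__":
    for stages in (5, 10):
        ideal = pipeline_speedup(10.0, stages)
        with_latches = pipeline_speedup(10.0, stages, latch_overhead_ns=0.5)
        print(f"{stages} stages: ideal {ideal:.2f}x, with latches {with_latches:.2f}x")
```

With zero latch overhead the speedup equals the number of stages. Once a fixed latch delay is added to every stage, doubling the stage count gives less than double the speedup.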

#### Hazards

- Structural hazards: different instructions in different stages (or the same stage) competing for the same resource
- Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction
- Control hazards: fetch cannot continue because it does not know the outcome of an earlier branch. This is a special case of a data hazard, kept as a separate category because it is handled differently

#### Conflicts/Problems

- I-cache and D-cache are accessed in the same cycle, so it helps to implement them separately
- Registers are read and written in the same cycle; this is easy to deal with if the register read/write time equals half the cycle time
- Instructions can't skip the DM stage, or else they would conflict for RW
- A consuming instruction may have to wait for its producer
- The branch target changes only at the end of the second stage; what do you do in the meantime?

#### Structural Hazards

- Example: a unified instruction and data cache; stage 4 (MEM) and stage 1 (IF) can never coincide
- The later instruction and all its successors are delayed until a cycle is found when the resource is free; these are pipeline bubbles
- Structural hazards are easy to eliminate: increase the number of resources (for example, implement separate instruction and data caches, add more register ports)
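The unified-cache example can be made concrete with a small sketch. It assumes the five stages IF, D/R, ALU, DM, RW (DM is the stage called MEM above), a single-ported cache, and no stalls other than this structural one, so an instruction's DM cycle is always its IF cycle plus 3. The function name `fetch_cycles_unified_cache` and the input encoding are hypothetical, chosen only for illustration.

```python
def fetch_cycles_unified_cache(accesses_memory):
    """Return the IF cycle of each instruction when IF and DM share one
    single-ported cache.

    accesses_memory[i] is True if instruction i uses the cache in its DM
    stage (for example a load or store).
    """
    fetch = []
    busy = set()  # cycles in which an older instruction uses the cache in DM
    cycle = 0
    for uses_mem in accesses_memory:
        cycle += 1
        while cycle in busy:   # cache is taken by an older instruction's DM
            cycle += 1         # so this fetch slips: a pipeline bubble
        fetch.append(cycle)
        if uses_mem:
            busy.add(cycle + 3)  # DM is three stages after IF
    return fetch


if __name__ == "__main__":
    # A load followed by four instructions that do not touch memory.
    print(fetch_cycles_unified_cache([True, False, False, False, False]))
```

In this run the fourth instruction would normally be fetched in cycle 4, but the load is in DM then. Its fetch slips to cycle 5, and the fifth instruction is delayed along with it.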

#### **Data Hazards**

- An instruction produces a value in a given pipeline stage
- A subsequent instruction consumes that value in a pipeline stage
- The consumer may have to be delayed so that the time of consumption is later than the time of production

A data hazard means waiting for a result that another instruction has not finished computing yet.
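A minimal sketch of how such producer/consumer pairs could be found in a program is shown below. The tuple encoding `(name, destination, sources)` and the function name `raw_dependences` are assumptions made for this illustration. The sample program is a hypothetical three-instruction sequence in which the second instruction reads a register written by the first.

```python
def raw_dependences(instrs):
    """Return (producer, consumer, register) triples, one for each source
    register that was written by an earlier instruction."""
    last_writer = {}  # register -> name of the latest instruction writing it
    deps = []
    for name, dest, srcs in instrs:
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], name, reg))
        last_writer[dest] = name
    return deps


if __name__ == "__main__":
    program = [
        ("I1", "R3", ["R1", "R2"]),  # R3 = R1 + R2
        ("I2", "R5", ["R3", "R4"]),  # R5 = R3 + R4, needs I1's result
        ("I3", "R9", ["R7", "R8"]),  # independent of I1 and I2
    ]
    print(raw_dependences(program))
```

Whether a reported dependence actually costs stall cycles depends on how far apart the two instructions are and on whether bypassing is available.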

### Example 1 – No Bypassing

Show the instruction occupying each stage in each cycle (no bypassing) if I1 is R1+R2=R3 and I2 is R3+R4=R5 and I3 is R7+R8=R9.

| Stage | CYC-1 | CYC-2 | CYC-3 | CYC-4 | CYC-5 | CYC-6 | CYC-7 | CYC-8 |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| IF    |       |       |       |       |       |       |       |       |
| D/R   |       |       |       |       |       |       |       |       |
| ALU   |       |       |       |       |       |       |       |       |
| DM    |       |       |       |       |       |       |       |       |
| RW    |       |       |       |       |       |       |       |       |



### Example 2 – Bypassing

Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2=R3 and I2 is R3+R4=R5 and I3 is R3+R8=R9. Identify the input latch for each input operand.

Normally, a dependent value is read from the register file, so the producing instruction must write it before it can be read. <mark>With bypassing, the value is intercepted from a latch before it has even been written to the register.</mark>

| Stage | CYC-1 | CYC-2 | CYC-3 | CYC-4 | CYC-5 | CYC-6 | CYC-7 | CYC-8 |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| IF    |       |       |       |       |       |       |       |       |
| D/R   |       |       |       |       |       |       |       |       |
| ALU   |       |       |       |       |       |       |       |       |
| DM    |       |       |       |       |       |       |       |       |
| RW    |       |       |       |       |       |       |       |       |
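A sketch of how the operand sources could be worked out for this exercise follows. It makes several assumptions that are not stated in the lecture: every instruction is a single-cycle ALU operation whose result exists at the end of its ALU stage, operands are needed at the start of the ALU stage, and latches are named after the stage that writes them ("latch after ALU", and so on). The lecture may number its latches differently. The function name `operand_sources_with_bypass` is hypothetical.

```python
def operand_sources_with_bypass(instrs):
    """instrs: list of (name, dest_reg, [source_regs]), all ALU operations.
    Returns {name: (alu_cycle, {source_reg: latch supplying the value})}."""
    alu_cycle_of_writer = {}  # register -> ALU cycle of its latest writer
    result = {}
    for i, (name, dest, srcs) in enumerate(instrs):
        # With full bypassing, back-to-back ALU operations never stall:
        # IF in cycle i+1, D/R in cycle i+2, ALU in cycle i+3.
        alu = i + 3
        sources = {}
        for reg in srcs:
            produced = alu_cycle_of_writer.get(reg)
            distance = None if produced is None else alu - produced
            if distance == 1:
                sources[reg] = "latch after ALU"   # producer was in ALU last cycle
            elif distance == 2:
                sources[reg] = "latch after DM"    # producer was in ALU two cycles ago
            else:
                sources[reg] = "latch after D/R"   # value read from the register file
        result[name] = (alu, sources)
        alu_cycle_of_writer[dest] = alu
    return result


if __name__ == "__main__":
    program = [("I1", "R3", ["R1", "R2"]),
               ("I2", "R5", ["R3", "R4"]),
               ("I3", "R9", ["R3", "R8"])]
    for name, (alu, sources) in operand_sources_with_bypass(program).items():
        print(name, "ALU in cycle", alu, sources)
```

Under these assumptions no instruction stalls. I2 takes R3 from the latch written by the ALU stage, I3 takes R3 one latch further down the pipeline, and every other operand comes from the register file through the latch after D/R.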





#### Problem 2




#### **Problem 3**

